
    Phoneme duration modelling for speaker verification

    Higher-level features are considered a potential remedy against transmission-line and cross-channel degradations, currently among the biggest problems in speaker verification. Phoneme durations in particular are not altered by these factors; a robust duration model would therefore be a particularly useful addition to traditional cepstral-based speaker verification systems. In this dissertation we investigate the feasibility of phoneme durations as a feature for speaker verification. Simple speaker-specific triphone duration models are created to statistically represent the phoneme durations. Durations are obtained from a hidden Markov model (HMM) based automatic speech recognition system and are modelled using single-mixture Gaussian distributions. These models are applied in a speaker verification system (trained and tested on the YOHO corpus) and found to be a useful feature, even when used in isolation. When fused with acoustic features, verification performance increases significantly.

    A novel speech rate normalization technique is developed to remove some of the inherent intra-speaker variability caused by differing speech rates, which negatively affect both speaker verification and automatic speech recognition. Although the duration modelling itself benefits only slightly from this procedure, the improvement in fused system performance is substantial. Other factors known to influence phoneme duration are also incorporated into the duration model. Utterance-final lengthening is known to be a consistent effect, so “position in sentence” is modelled. “Position in word” is modelled as well, since triphones do not provide enough contextual information; this improves performance because some vowels’ durations are particularly sensitive to their position in the word.

    Data scarcity becomes a problem when building speaker-specific duration models. To overcome it, we develop a novel approach that predicts unknown phoneme durations from the values of known phoneme durations for a particular speaker, based on the maximum likelihood criterion. The model rests on the observation that phonemes from the same broad phonetic class tend to co-vary strongly, but that there are also significant cross-class correlations. This approach is tested on the TIMIT corpus and found to be more accurate than back-off techniques.

    Dissertation (MEng)--University of Pretoria, Electrical, Electronic and Computer Engineering, 2009.
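    As an illustration of the basic approach described above, here is a minimal sketch of speaker-specific, single-mixture Gaussian triphone duration models with log-likelihood-ratio scoring. The function names and the dict-based data layout are hypothetical; in the dissertation the durations come from forced alignment with the HMM-based recognition system.

        import numpy as np

        def fit_duration_models(durations_by_triphone):
            """Fit a single-mixture Gaussian to each triphone's durations (e.g. in frames)."""
            models = {}
            for triphone, durs in durations_by_triphone.items():
                durs = np.asarray(durs, dtype=float)
                models[triphone] = (durs.mean(), max(durs.std(), 1e-3))  # floor the std dev
            return models

        def log_likelihood(models, segments):
            """Total Gaussian log-likelihood of observed (triphone, duration) pairs."""
            total = 0.0
            for triphone, dur in segments:
                if triphone not in models:
                    continue  # unseen triphone: skip here; back off or predict (see below)
                mu, sigma = models[triphone]
                total += -0.5 * np.log(2.0 * np.pi * sigma**2) - (dur - mu)**2 / (2.0 * sigma**2)
            return total

        def duration_score(speaker_models, background_models, segments):
            """Log-likelihood ratio of the claimed speaker against a background model set."""
            return (log_likelihood(speaker_models, segments)
                    - log_likelihood(background_models, segments))

    Fusion with an acoustic system can then be as simple as a weighted sum of the duration and cepstral scores, with the weight tuned on development data.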
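    The maximum-likelihood prediction of unknown durations can be read as the conditional mean of a joint Gaussian over per-speaker mean phoneme durations; the sketch below follows that reading, which is an assumption, since the abstract does not spell out the formulation. Both the strong within-class covariation and the cross-class correlations enter through the covariance matrix.

        import numpy as np

        def predict_unknown_durations(mu, Sigma, known_idx, known_vals):
            """Conditional-mean (ML) estimate of a speaker's unseen mean durations.

            mu, Sigma: joint mean vector and covariance of per-speaker mean phoneme
            durations, estimated over a pool of training speakers (e.g. TIMIT).
            known_idx: indices of the phonemes observed for the target speaker.
            known_vals: that speaker's mean durations for those phonemes.
            """
            unknown_idx = np.setdiff1d(np.arange(len(mu)), known_idx)
            S_kk = Sigma[np.ix_(known_idx, known_idx)]
            S_uk = Sigma[np.ix_(unknown_idx, known_idx)]
            pred = mu[unknown_idx] + S_uk @ np.linalg.solve(S_kk, known_vals - mu[known_idx])
            return unknown_idx, pred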

    Efficient training of support vector machines and their hyperparameters

    Thesis (Ph.D. (Computer Engineering))--North-West University, Potchefstroom Campus, 2012.

    As digital computers become increasingly powerful and ubiquitous, there is a growing need for pattern-recognition algorithms that can handle very large data sets. Support vector machines (SVMs), generally viewed as the most accurate classifiers for general-purpose pattern recognition, are somewhat problematic in this respect: as with all classifiers that employ hyperparameters, the behavior of SVMs depends strongly on the particular choice of hyperparameter values, and popular approaches to training SVMs require computationally expensive grid searches to choose these values appropriately [1, 2]. Our main objective is therefore to find more efficient ways to train SVM hyperparameters. We also show that on non-separable datasets SVMs do not behave like large margin classifiers, an observation that leads us to explore algorithms without a margin term. Since one of the SVM hyperparameters is a regularization parameter that controls the relative contribution of the margin term and the sum of misclassifications, dropping the margin term leaves one less hyperparameter to train.

    Grid searches are an expensive yet widely used technique for training SVM hyperparameters, so we investigate ways in which the hyperparameters can be found more efficiently, as well as alternative algorithms that are similar to SVMs but have fewer hyperparameters. With this goal in mind, we first investigate the scaling and asymptotic behavior of popular SVM hyperparameters on non-separable datasets. We find that the scale factor of the radial basis function (RBF) kernel depends only weakly on the size of the training set, and that the regularization parameter C must assume relatively large values for accurate classification. The observation regarding C holds for all datasets considered in the thesis when a linear kernel is employed; for RBF kernels the evidence is less strong. The preference for large C casts doubt on the large margin classifier (LMC) label often attached to SVMs, especially with linear kernels. Further investigation confirms our suspicion that minimization of an error term, rather than maximization of the inter-class margin, is responsible for the widely acknowledged excellence of SVM classifiers.

    These insights suggest two approaches to reducing overall SVM training time: hyperparameter training on reduced training sets, and stochastic optimization of a simplified criterion function. The first approach is further enhanced by a heuristic for choosing the RBF scale factor, enabling a hyperparameter selection algorithm that performs as well as the conventional SVM approach on all classification problems considered in this thesis while reducing the required training time by several orders of magnitude. The second approach, stochastic optimization of a simplified criterion, is slightly less accurate on some problems but reduces the overall training time even further. With training sets of tens of thousands of samples, efficient hyperparameter selection for standard SVMs is the method of choice; looking to a future in which training-set sizes will inevitably continue to increase, methods such as our stochastic approach will become preferable for a growing proportion of practical problems.
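    The first approach, hyperparameter selection on a reduced training set, might look like the sketch below. The thesis's own RBF scale heuristic is not given in the abstract, so the common median-pairwise-distance heuristic stands in for it; scikit-learn's SVC and GridSearchCV, the subset sizes, and the C grid are all illustrative assumptions.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import GridSearchCV

        def select_hyperparameters(X, y, subset_size=2000, seed=0):
            """Tune C on a random subset, with the RBF scale fixed by a heuristic."""
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
            Xs, ys = X[idx], y[idx]
            # Median-distance heuristic for the RBF scale (a stand-in for the
            # thesis's heuristic, which the abstract does not specify).
            sample = Xs[rng.choice(len(Xs), size=min(500, len(Xs)), replace=False)]
            d2 = ((sample[:, None, :] - sample[None, :, :]) ** 2).sum(axis=-1)
            gamma = 1.0 / np.median(d2[d2 > 0])
            # Relatively large values of C are favoured on non-separable data.
            search = GridSearchCV(SVC(kernel="rbf", gamma=gamma),
                                  {"C": [1, 10, 100, 1000, 10000]}, cv=3)
            search.fit(Xs, ys)
            return search.best_params_["C"], gamma

    The final SVM is then trained on the full set with the selected C and gamma, which is where the savings over a full-data grid search come from.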
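    The second approach, stochastic optimization of a simplified criterion, drops the margin term and minimizes the sum of hinge losses alone, so the regularization parameter C disappears. A minimal subgradient-descent sketch, assuming labels in {-1, +1}:

        import numpy as np

        def sgd_simplified_criterion(X, y, epochs=10, lr=0.01, seed=0):
            """SGD on the sum of hinge losses, with no margin/regularization term."""
            rng = np.random.default_rng(seed)
            w = np.zeros(X.shape[1])
            b = 0.0
            for _ in range(epochs):
                for i in rng.permutation(len(X)):
                    if y[i] * (X[i] @ w + b) < 1.0:  # hinge loss is active for this sample
                        w += lr * y[i] * X[i]        # subgradient step
                        b += lr * y[i]
            return w, b

    Only the learning rate and epoch count remain to be chosen, which is the point of removing the margin term.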